2023-10-30 14:41:15 · AIbase
Apple and Columbia University Join Forces to Develop the Ferret Multimodal Language Model
Researchers from Apple and Columbia University have developed Ferret, a multimodal language model aimed at advanced image understanding and description. Ferret combines strong global image understanding with the ability to handle free-form text and referenced image regions, outperforming traditional models. To guide the model in referring and grounding (localization) tasks, the researchers built the GRIT dataset of roughly 1.1 million samples. In Ferret-Bench evaluations, Ferret outperforms the best existing MLLMs by an average of 20.4% while reducing object hallucination errors.
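To make the referring-and-grounding idea concrete, below is a minimal Python sketch of how a refer-and-ground sample could be structured: a prompt that references an image region by its coordinates, and an answer that grounds the mentioned object with a predicted box. All names, the coordinate format, and the serialization are illustrative assumptions, not the actual Ferret interface or the GRIT data format; a real model would encode regions with learned visual features rather than raw coordinate strings.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical bounding box: (x1, y1, x2, y2) in pixel coordinates.
Box = Tuple[int, int, int, int]


@dataclass
class ReferAndGroundSample:
    """One illustrative refer-and-ground sample (assumed structure).

    `referred_regions` stands in for the regions the user points at in the
    prompt; `grounded_objects` stands in for the objects the model is
    expected to localize in its answer.
    """
    image_path: str
    prompt: str                              # prompt containing region references
    referred_regions: List[Box]              # regions referenced by the prompt
    answer: str                              # answer containing grounded mentions
    grounded_objects: List[Tuple[str, Box]]  # (object phrase, predicted box)


def build_prompt(question: str, regions: List[Box]) -> str:
    """Splice region coordinates into the question text.

    This simply serializes coordinates so the example stays self-contained;
    it is not how Ferret represents regions internally.
    """
    region_strs = [f"<region {i}: {box}>" for i, box in enumerate(regions)]
    return question + " " + " ".join(region_strs)


if __name__ == "__main__":
    sample = ReferAndGroundSample(
        image_path="kitchen.jpg",
        prompt=build_prompt("What is the object here used for?", [(120, 80, 260, 210)]),
        referred_regions=[(120, 80, 260, 210)],
        answer="It is a kettle [140, 90, 250, 200], used to boil water.",
        grounded_objects=[("kettle", (140, 90, 250, 200))],
    )
    print(sample.prompt)
```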